26 research outputs found

    Contributions for improving debugging of kernel-level services in a monolithic operating system

    Despite the existence of an overwhelming amount of research on the quality of system software, operating systems are still plagued with reliability issues mainly caused by defects in kernel-level services such as device drivers and file systems. Studies have indeed shown that each release of the Linux kernel contains between 600 and 700 faults, and that the propensity of device drivers to contain errors is up to seven times higher than that of any other part of the kernel. These numbers suggest that kernel-level service code is not sufficiently tested and that many faults remain unnoticed or are hard to fix by non-expert programmers, who nonetheless account for the majority of service developers. This thesis proposes a new approach to the debugging and testing of kernel-level services, focused on the interaction between the services and the core kernel. The approach tackles the issue of safety holes in the implementation of kernel API functions. For Linux, we have instantiated the Diagnosys automated approach, which relies on static analysis of kernel code to identify, categorize, and expose the different safety holes of API functions that can turn into runtime faults when the functions are used in service code by developers with limited knowledge of the intricacies of kernel code. To illustrate our approach, we have implemented Diagnosys for Linux 2.6.32 and shown its benefits in supporting developers in their testing and debugging tasks.
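
    To make the notion of an API "safety hole" concrete, here is a toy, regex-based scan for one such pattern: a function that dereferences a pointer parameter without any NULL check. This is only an illustrative sketch; Diagnosys itself relies on full static analysis of the kernel source, not pattern matching like this.

        import re

        # Match "name(params) { body }"; good enough for a single-function toy input.
        FUNC = re.compile(r"(?P<name>\w+)\s*\((?P<params>[^)]*)\)\s*\{(?P<body>[^}]*)\}")

        def unchecked_derefs(c_source):
            """Yield (function, parameter) pairs where a pointer parameter is
            dereferenced although the body contains no NULL test for it."""
            for m in FUNC.finditer(c_source):
                for param in m.group("params").split(","):
                    if "*" not in param:
                        continue  # only pointer parameters can hold NULL
                    pname = param.split("*")[-1].strip()
                    body = m.group("body")
                    deref = re.search(rf"\b{pname}\s*->", body)
                    checked = re.search(rf"\b{pname}\s*[!=]=\s*NULL|!\s*{pname}\b", body)
                    if deref and not checked:
                        yield m.group("name"), pname

        SAMPLE = "int demo_read(struct file *filp) { return filp->f_mode; }"
        print(list(unchecked_derefs(SAMPLE)))  # [('demo_read', 'filp')]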

    Patch-CLIP: A Patch-Text Pre-Trained Model

    In recent years, patch representation learning has emerged as a necessary research direction for exploiting the capabilities of machine learning in software generation. These representations have driven significant performance enhancements across a variety of tasks involving code changes. While the progress is undeniable, a common limitation among existing models is their specialization: they predominantly excel either in predictive tasks, such as security patch classification, or in generative tasks, such as patch description generation. This dichotomy is further exacerbated by a prevalent dependency on potentially noisy data sources. Specifically, many models utilize patches integrated with Abstract Syntax Trees (ASTs) that, unfortunately, may contain parsing inaccuracies, thus acting as a suboptimal source of supervision. In response to these challenges, we introduce PATCH-CLIP, a novel pre-training framework for patches and natural language text. PATCH-CLIP deploys a triple-loss training strategy for 1) patch-description contrastive learning, which separates patches and descriptions in the embedding space, 2) patch-description matching, which ensures that each patch is associated with its description in the embedding space, and 3) patch-description generation, which ensures that the patch embedding is effective for generation. These losses are jointly optimized to achieve good performance in both predictive and generative tasks involving patches. Empirical evaluations focusing on patch description generation demonstrate that PATCH-CLIP sets a new state of the art, consistently outperforming prior approaches on metrics such as BLEU, ROUGE-L, METEOR, and Recall.
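
    The shape of such a triple-loss objective can be sketched compactly. The PyTorch snippet below is a minimal illustration of how the three losses combine over a batch of patch/description pairs; the loss weights, temperature, and the all-positive matching targets are placeholder assumptions, not PATCH-CLIP's actual architecture or training setup.

        import torch
        import torch.nn.functional as F

        def triple_loss(patch_emb, desc_emb, match_logits, gen_logits, gen_targets,
                        temperature=0.07, weights=(1.0, 1.0, 1.0)):
            """patch_emb, desc_emb: (B, D) L2-normalized embeddings of B pairs.
            match_logits: (B,) patch/description matching scores.
            gen_logits: (B, T, V) decoder logits; gen_targets: (B, T) token ids."""
            B = patch_emb.size(0)
            # 1) contrastive loss: true pairs lie on the diagonal of the similarity matrix
            sim = patch_emb @ desc_emb.t() / temperature
            labels = torch.arange(B)
            l_con = (F.cross_entropy(sim, labels) + F.cross_entropy(sim.t(), labels)) / 2
            # 2) matching loss: binary decision per pair (real training would also
            #    feed mismatched pairs as negatives; all-ones targets are a placeholder)
            l_match = F.binary_cross_entropy_with_logits(match_logits, torch.ones(B))
            # 3) generation loss: token-level cross-entropy for description generation
            l_gen = F.cross_entropy(gen_logits.flatten(0, 1), gen_targets.flatten())
            w1, w2, w3 = weights
            return w1 * l_con + w2 * l_match + w3 * l_gen

        # Toy usage with random tensors standing in for encoder/decoder outputs.
        B, D, T, V = 4, 16, 8, 100
        p = F.normalize(torch.randn(B, D), dim=-1)
        d = F.normalize(torch.randn(B, D), dim=-1)
        loss = triple_loss(p, d, torch.randn(B), torch.randn(B, T, V),
                           torch.randint(0, V, (B, T)))
        print(loss.item())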

    Network structure of social coding in GitHub

    Abstract—Social coding enables a different experience of software development, as the activities and interests of one developer are easily advertised to other developers. Developers can thus track the activities relevant to various projects in one umbrella site. Such a major change in collaborative software development makes an investigation of the networks formed on social coding sites valuable. Furthermore, project hosting platforms promoting this development paradigm have been thriving, among which GitHub has arguably gained the most momentum. In this paper, we contribute to the body of knowledge on social coding by investigating the network structure of social coding in GitHub. We collect 100,000 projects and 30,000 developers from GitHub, construct developer-developer and project-project relationship graphs, and compute various characteristics of the graphs. We then identify influential developers and projects on this subnetwork of GitHub by using PageRank. Understanding how developers and projects are actually related to each other on a social coding site is the first step towards building tool support to aid social programmers in performing their tasks more efficiently.
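
    As a minimal illustration of the PageRank step, the snippet below ranks developers in a toy developer-developer graph with networkx; the edges are invented stand-ins for the relationships mined from GitHub in the paper.

        import networkx as nx

        G = nx.DiGraph()
        # Directed edges model one developer's attention to another
        # (e.g., following, or contributing to the other's project).
        G.add_edges_from([
            ("alice", "bob"), ("carol", "bob"), ("dave", "bob"),
            ("bob", "alice"), ("carol", "alice"), ("dave", "carol"),
        ])

        ranks = nx.pagerank(G, alpha=0.85)  # standard damping factor
        for dev, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
            print(f"{dev}: {score:.3f}")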

    Adoption of Software Testing in Open Source Projects: A Preliminary Study on 50,000 Projects

    Abstract—In software engineering, testing is a crucial activity that is designed to ensure the quality of program code. For this activity, development teams spend substantial resources constructing test cases to thoroughly assess the correctness of software functionality. What proportion of open source projects, however, actually include test cases? What kinds of projects are more likely to include them? In this study, we explore 50,000 projects and investigate the correlation between the presence of test cases and various project development characteristics, including the number of lines of code and the size of development teams.
    Keywords: Empirical study, Software testing, Adequacy, Test case
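
    A study of this kind needs an operational definition of "includes test cases". The sketch below shows one plausible heuristic based on conventional test file and directory names; the patterns are assumptions for illustration, not the paper's exact criteria.

        from pathlib import Path

        TEST_PATTERNS = ("test_*.py", "*_test.py", "*Test.java", "*.spec.js")
        TEST_DIRS = {"test", "tests", "spec"}

        def has_tests(project_root):
            """Heuristically decide whether a project ships test cases."""
            root = Path(project_root)
            # any file matching a conventional test name...
            if any(any(root.rglob(pat)) for pat in TEST_PATTERNS):
                return True
            # ...or any conventional test directory
            return any(p.is_dir() and p.name.lower() in TEST_DIRS
                       for p in root.rglob("*"))

        print(has_tests("."))  # run from a project checkout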

    An empirical study of adoption of software testing in open source projects

    Abstract—In software engineering, testing is a crucial activity that is designed to ensure the quality of program code. For this activity, software teams spend substantial resources constructing test cases to thoroughly assess the correctness of software functionality. What proportion of open source projects include test cases? What is the effect of the number of developers on the number of test cases? In this study, we explore open source projects and investigate the correlation between the presence of test cases and various project development characteristics, including the number of lines of code, the size of development teams, and the quantity of bug reports. The results show that projects with test cases are bigger in size and that projects with bigger teams have a higher number of test cases. Surprisingly, however, the number of test cases correlates only weakly with the number of bugs.
    Keywords: Empirical study, Software testing, Adequacy, Test case
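
    The correlation analysis itself is straightforward to reproduce in spirit. Below is a sketch using Spearman rank correlation on toy per-project counts; the arrays are stand-ins for the study's measurements, and the study's actual choice of correlation measure may differ.

        from scipy.stats import spearmanr

        test_cases = [0, 12, 3, 45, 7, 120, 0, 9]    # per-project test case counts
        bug_reports = [5, 20, 4, 18, 9, 35, 2, 11]   # per-project bug report counts

        rho, p_value = spearmanr(test_cases, bug_reports)
        print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")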

    Orion: A software project search engine with integrated diverse software artifacts

    Abstract—Software projects produce a wealth of data that is leveraged in different tasks and for different purposes: researchers collect project data for building experimental datasets; software programmers reuse code from projects; developers often explore opportunities for getting involved in the development of a project to gain or offer expertise. Finding relevant projects that suit one's needs is, however, currently challenging given the capabilities of existing search systems. We propose Orion, an integrated search engine architecture that combines information from different types of software repositories from multiple sources to facilitate the construction and execution of advanced search queries. Orion provides a declarative query language that gives users access to a uniform interface that transparently integrates different artifacts of project development and maintenance, such as source code information, version control system metadata, bug tracking system elements, and metadata on developer activities and interactions extracted from hosting platforms. We have built an extensible system with an initial capability of over 100,000 projects collected from the web, featuring several types of software repositories and software development artifacts. We conducted an experiment with 10 search scenarios to compare Orion with traditional search engines and to explore the need for our approach as well as the productivity of the proposed infrastructure. The results show, with strong statistical significance, that users find relevant projects faster and more accurately with Orion.
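
    The abstract does not show Orion's query language, but the idea of one declarative query spanning code, version control, bug tracker, and activity metadata can be sketched hypothetically as follows; the schema and the query helper are invented for this illustration.

        # Hypothetical cross-artifact project records and a declarative-style query.
        projects = [
            {"name": "libfoo", "language": "C", "loc": 120_000,
             "open_bugs": 42, "contributors": 15, "has_tests": True},
            {"name": "barjs", "language": "JavaScript", "loc": 8_000,
             "open_bugs": 3, "contributors": 2, "has_tests": False},
        ]

        def query(projects, **criteria):
            """Select projects whose fields satisfy all given predicates."""
            return [p for p in projects
                    if all(pred(p[field]) for field, pred in criteria.items())]

        # "C projects with tests, at least 10 contributors, and few open bugs"
        hits = query(projects,
                     language=lambda v: v == "C",
                     has_tests=lambda v: v,
                     contributors=lambda v: v >= 10,
                     open_bugs=lambda v: v < 100)
        print([p["name"] for p in hits])  # ['libfoo']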

    Popularity, interoperability, and impact of programming languages in 100,000 open source projects

    Abstract—Programming languages were being proposed even before the era of the modern computer. As the years have gone by, computer resources have increased and application domains have expanded, leading to the proliferation of hundreds of programming languages, each attempting to improve over others or to address new programming paradigms. These languages range from procedural languages like C to object-oriented languages like Java and functional languages such as ML and Haskell. Unfortunately, there is a lack of large-scale and comprehensive studies that examine the “popularity”, “interoperability”, and “impact” of various programming languages. To fill this gap, this study investigates a hundred thousand open source software projects from GitHub to answer various research questions on the “popularity”, “interoperability”, and “impact” of various languages, measured in different ways (e.g., in terms of lines of code, development teams, issues, etc.).
    Keywords: Programming languages; Popularity; Interoperability; Open source; Software projects; GitHub
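
    One of the popularity measures mentioned, lines of code per language, can be aggregated across projects as in the following sketch; the per-project breakdowns are toy data standing in for what a LOC counter would report.

        from collections import Counter

        # language -> lines of code, one dict per project
        project_locs = [
            {"C": 50_000, "Python": 2_000},
            {"Java": 30_000},
            {"JavaScript": 12_000, "Python": 4_000},
        ]

        totals = Counter()
        for breakdown in project_locs:
            totals.update(breakdown)  # sums LOC per language across projects

        for lang, loc in totals.most_common():
            print(f"{lang:10s} {loc:>8,d} LOC")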

    Data-driven Simulation and Optimization for Covid-19 Exit Strategies

    The rapid spread of the SARS-CoV-2 coronavirus is a major challenge that has led almost all governments worldwide to take drastic measures in response to the tragedy. Chief among those measures is the massive lockdown of entire countries and cities, which, beyond its global economic impact, has created deep social and psychological tensions within populations. While the adopted mitigation measures (including the lockdown) have generally proven useful, policymakers are now facing a critical question: how and when should the mitigation measures be lifted? A carefully planned exit strategy is indeed necessary to recover from the pandemic without risking a new outbreak. Classically, exit strategies rely on mathematical modeling to predict the effect of public health interventions. Such models are unfortunately known to be sensitive to some key parameters, which are usually set based on rules of thumb. In this paper, we propose to augment epidemiological forecasting with data-driven models that learn to fine-tune predictions for different contexts (e.g., per country). We have therefore built a pandemic simulation and forecasting toolkit that combines a deep-learning estimation of the epidemiological parameters of the disease, used to predict cases and deaths, with a genetic algorithm component that searches for optimal trade-offs/policies between the constraints and objectives set by decision-makers. Replaying pandemic evolution in various countries, we experimentally show that our approach yields predictions with much lower error rates than pure epidemiological models in 75% of the cases and achieves a 95% R2 score when the learning is transferred and tested on unseen countries. When used for forecasting, this approach provides actionable insights into the impact of individual measures and strategies.
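
    The epidemiological core that such a toolkit fine-tunes can be illustrated with a minimal discrete-time SEIR model; the parameter values below are illustrative defaults, whereas the paper's approach estimates them from data with deep learning.

        import numpy as np

        def seir(days, N=1e6, beta=0.3, sigma=1/5.2, gamma=1/10, I0=100):
            """beta: transmission rate, sigma: 1/incubation period,
            gamma: 1/infectious period. Returns daily (S, E, I, R) rows."""
            S, E, I, R = N - I0, 0.0, float(I0), 0.0
            out = np.zeros((days, 4))
            for t in range(days):
                new_exposed = beta * S * I / N
                new_infectious = sigma * E
                new_recovered = gamma * I
                S -= new_exposed
                E += new_exposed - new_infectious
                I += new_infectious - new_recovered
                R += new_recovered
                out[t] = (S, E, I, R)
            return out

        traj = seir(120)
        print(f"peak infectious: {traj[:, 2].max():,.0f} on day {traj[:, 2].argmax()}")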